
    Streamed Data Analysis Using Adaptable Bloom Filter

    With the rise of web applications and technologies such as sensors, IoT, and cloud computing, the number of data generation sources has increased exponentially. Stream processing requires real-time analytics of data in motion, in a single pass. This paper proposes a framework for hourly analysis of streamed data using a Bloom filter, a probabilistic data structure, in which hashing combines double hashing and partition hashing, leading to fewer inter-hash-function collisions and lower computational overhead. When the size of the incoming data is not known, a static Bloom filter suffers a high collision rate if the data volume is large and wastes storage space if it is small. In such cases it is difficult to determine the optimal Bloom filter parameters (m, k) in advance, so a target false-positive threshold (f_p) cannot be guaranteed. To accommodate growing data, the filter size m should grow dynamically. A Kalman filter is used to predict the array size of the Bloom filter. Experiments show that the proposed Adaptable Bloom Filter (ATBF) efficiently performs peak-hour and server-utilization analysis and reduces the time and space required for querying dynamic datasets.
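The combination of double hashing and partition hashing mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's ATBF: the class name `PartitionedBloomFilter`, the use of SHA-256 to derive the two base hashes, and the fixed size `m` are all assumptions made for the example, and the Kalman-filter resizing step is not shown.

```python
import hashlib


class PartitionedBloomFilter:
    """Bloom filter with k equal partitions; hash i sets one bit in partition i.

    Hypothetical sketch: the two base hashes come from one SHA-256 digest and
    the k probe positions are derived by double hashing (h1 + i * h2).
    """

    def __init__(self, m: int, k: int):
        if k < 1 or m < k:
            raise ValueError("need k >= 1 and m >= k")
        self.k = k
        self.part = m // k                      # slots per partition
        self.bits = bytearray(self.part * k)    # one byte per bit, for clarity

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1   # force odd, never zero
        for i in range(self.k):
            # Double hashing picks the slot; the offset i * part confines
            # hash i to its own partition, so hashes never collide with
            # each other for the same item.
            yield i * self.part + (h1 + i * h2) % self.part

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p] for p in self._positions(item))


bf = PartitionedBloomFilter(m=1024, k=4)
bf.add("10.0.0.1")
print("10.0.0.1" in bf)
```

Only one digest is computed per item regardless of k, which is where double hashing saves computation; partitioning is what removes collisions between different hash functions.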

    FingerPrint Based Duplicate Detection in Streamed Data

    In computing, duplicate data detection refers to identifying repeated copies of the same data. Identifying duplicate items in streamed data and eliminating them before storage is a complex task. This paper proposes a novel data structure for duplicate detection, a variant of the stable Bloom filter named the FingerPrint Stable Bloom Filter (FP-SBF). The proposed approach uses a counting Bloom filter with fingerprint bits, along with an optimization mechanism for duplicate detection. FP-SBF uses d-left hashing, which reduces computation time and decreases both false positives and false negatives. FP-SBF can process unbounded data in a single pass using k hash functions, and distinguishes duplicate from distinct elements in O(k+1) time, independent of the size of the incoming data. The performance of FP-SBF has been compared with various Bloom filters used for duplicate detection in stream data, and it is shown theoretically and experimentally that the proposed approach efficiently detects duplicates in streaming data with lower memory requirements.
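For orientation, here is a rough sketch of the plain stable Bloom filter that FP-SBF is described as a variant of. It is not FP-SBF itself: fingerprint bits and d-left hashing are omitted, and the class name, parameter names (`max_val`, `decay`), and SHA-256-based hashing are assumptions made for the example.

```python
import hashlib
import random


class StableBloomFilter:
    """Plain stable Bloom filter sketch (no fingerprints, no d-left hashing).

    Each arrival decrements a few randomly chosen cells, so old items fade
    out and the filter stays usable on an unbounded stream.
    """

    def __init__(self, m: int, k: int, max_val: int = 3, decay: int = 4, seed: int = 0):
        self.m, self.k = m, k
        self.max_val = max_val          # value a cell is reset to on insert
        self.decay = decay              # cells decremented per arrival
        self.cells = [0] * m
        self.rng = random.Random(seed)  # seeded so runs are repeatable

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def is_duplicate(self, item: str) -> bool:
        """Report whether item looks like a duplicate, then record it."""
        pos = self._positions(item)
        duplicate = all(self.cells[p] > 0 for p in pos)
        # Decay step: decrement randomly chosen cells, never below zero.
        for _ in range(self.decay):
            j = self.rng.randrange(self.m)
            if self.cells[j] > 0:
                self.cells[j] -= 1
        # Insert step: set the item's k cells to the maximum value.
        for p in pos:
            self.cells[p] = self.max_val
        return duplicate


sbf = StableBloomFilter(m=1024, k=3)
for event in ["a", "b", "a"]:
    print(event, sbf.is_duplicate(event))
```

Each arrival costs k cell reads plus a constant amount of decay work, independent of stream length. The decay step is also what introduces false negatives, which the abstract says FP-SBF reduces.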

    Recommender System using Collaborative Filtering and Demographic Characteristics of Users

    Recommender systems use a variety of data mining techniques and algorithms to identify the items most relevant to a user out of millions of available choices. Recommender systems are classified into collaborative filtering, content-based filtering, knowledge-based filtering, and hybrid filtering systems. Traditional recommender approaches face many challenges, such as data sparsity, the cold-start problem, scalability, synonymy, shilling attacks, and the gray sheep and black sheep problems. These problems degrade the performance of recommender systems to a great extent. Among them, the cold-start problem arises when either a new user enters the system or a new product arrives in the catalogue. Both situations make it difficult to predict user preferences because sufficient rating history is not available. The study proposes a new hybrid recommender system framework that addresses the new-user cold-start problem by exploiting user demographic characteristics to find similarity between the new user and existing users in the system. The proposed approach can improve the efficiency of recommender systems because it computes recommendations for a new user by predicting preferences within a much smaller cluster rather than over the entire customer base. The analysis uses the MovieLens dataset, with the aim of enhancing the performance of an online movie recommendation system. DOI: 10.17762/ijritcc2321-8169.15077
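The idea of using demographic similarity for a new user can be illustrated roughly as below. This is a sketch under stated assumptions, not the study's framework: the attribute names, the overlap-based similarity, the nearest-neighbour selection (standing in for the clustering step), and the toy ratings are all hypothetical.

```python
from collections import defaultdict


def demographic_similarity(a: dict, b: dict) -> float:
    """Fraction of shared demographic attributes on which two users agree."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[key] == b[key] for key in keys) / len(keys)


def recommend_for_new_user(new_user, users, ratings, n_neighbors=2, top_n=3):
    """Rank items by mean rating among the demographically closest users."""
    ranked = sorted(
        users,
        key=lambda u: demographic_similarity(new_user, users[u]),
        reverse=True,
    )
    neighbors = ranked[:n_neighbors]
    totals, counts = defaultdict(float), defaultdict(int)
    for u in neighbors:
        for item, rating in ratings.get(u, {}).items():
            totals[item] += rating
            counts[item] += 1
    means = {item: totals[item] / counts[item] for item in totals}
    # Highest mean first; ties broken by item name for determinism.
    return sorted(means, key=lambda item: (-means[item], item))[:top_n]


# Toy data (hypothetical, not taken from MovieLens).
users = {
    "u1": {"age": "18-24", "gender": "F", "occupation": "student"},
    "u2": {"age": "18-24", "gender": "F", "occupation": "engineer"},
    "u3": {"age": "50+", "gender": "M", "occupation": "retired"},
}
ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "C": 5},
    "u3": {"D": 5},
}
new_user = {"age": "18-24", "gender": "F", "occupation": "student"}
print(recommend_for_new_user(new_user, users, ratings))
```

Because only the closest users contribute ratings, the prediction runs over a small group instead of the whole user base, which is the efficiency argument made in the abstract.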

    An Efficient Algorithm and Architecture for Network Processors

    A buffer management algorithm plays an important role in determining the packet loss ratio in a computer network. Two types of packet buffer management algorithms, static and dynamic, can be used in the Network Interface Card (NIC) of a network terminal. In general, dynamic algorithms are more efficient than static ones. However, once the buffer space allocated to an application is filled, further incoming packets for that application are rejected. We propose a history-based scheme called History Based Dynamic Algorithm (HBDA), which reduces the packet loss ratio by monitoring whether or not the application is active. For average network traffic loads, HBDA improves the packet loss ratio by 15.9% and 11% (for load = 0.7) compared to DA and DADT, respectively. For heavy traffic load, the improvement is 16.2% and 11.7% (for load = 0.7), and for actual traffic load the improvement is 12.7% and 7.1% (for load = 0.7), over DA and DADT respectively. We also developed a new architecture for the Network Interface Card. The new architecture supports multi-processor systems and gives more consideration to the application with the highest priority. It has two control units for processing incoming packets in parallel. For the traffic mix with average network traffic loads, the new architecture improves the packet loss ratio for the priority application by a significant amount.

    Packet Buffer Management for a High-Speed Network Interface Card

    Packet buffers in a smart network interface card are managed so as to reduce packet losses from high-speed bursts of incoming data. Currently, two types of packet buffer management techniques are used: static and dynamic. Dynamic buffer management techniques are more efficient than static ones because they change the threshold value according to network traffic conditions. However, current dynamic techniques cannot adjust the threshold instantaneously, so packet losses with dynamic techniques are still high. We therefore propose a history-based buffer management scheme to address the issue. Our experimental results show that the history-based scheme reduces packet loss by 11% to 15.9% compared to other conventional dynamic algorithms. Keywords: packet buffer; network interface card; layer 3 and 4 protocols; VHDL; static and dynamic buffer management
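To make concrete what changing the threshold according to traffic conditions can look like, here is a small sketch of a classic dynamic-threshold admission rule for a shared buffer. It is not the history-based scheme proposed in the abstract, whose details are not given here, and it is not claimed to be the DA or DADT baseline: the class name `SharedBuffer`, the parameter `alpha`, and the unit packet size are assumptions made for the example.

```python
class SharedBuffer:
    """Shared packet buffer with a dynamic per-application threshold.

    Rule sketched here: an application may enqueue only while its own queue
    is shorter than alpha * (free buffer space). The threshold therefore
    shrinks as the buffer fills and grows again as it drains.
    """

    def __init__(self, capacity: int, alpha: float = 1.0):
        self.capacity = capacity
        self.alpha = alpha
        self.queues = {}            # application id -> packets queued

    def used(self) -> int:
        return sum(self.queues.values())

    def admit(self, app: str) -> bool:
        """Try to enqueue one packet for app; return False if it is dropped."""
        free = self.capacity - self.used()
        threshold = self.alpha * free
        qlen = self.queues.get(app, 0)
        if free < 1 or qlen >= threshold:
            return False
        self.queues[app] = qlen + 1
        return True

    def release(self, app: str) -> None:
        """Dequeue one packet for app, if any is queued."""
        if self.queues.get(app, 0) > 0:
            self.queues[app] -= 1


buf = SharedBuffer(capacity=8, alpha=1.0)
accepted_a = sum(buf.admit("a") for _ in range(10))
accepted_b = sum(buf.admit("b") for _ in range(10))
print(accepted_a, accepted_b)
```

With these settings a single bursty application cannot take the whole buffer, since some space is held back for applications that become active later. A static scheme would instead fix each application's share in advance.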